BetterGedcom - What IS BetterGEDCOM

ttwetmore 2011-01-03T08:32:56-08:00

Thoughts on GEDCOM and Better GEDCOM

I was asked to put together some of my thoughts about GEDCOM and Better GEDCOM. Here is a link to the document I put together:

http://deadendssoftware.com/Thoughts.pdf

Tom Wetmore

louiskessler 2011-01-06T18:46:45-08:00

Andy:

Have you read Tamura's blog? He's been doing what you say for years. But they don't listen.
http://www.tamurajones.net/

Louis

ttwetmore 2011-01-06T18:52:43-08:00

It's great to be back on good terms with Louis. Our discussion may have resurrected some interest in taking more careful looks at GEDCOM, especially concerning the post-5.5 work done by the LDS, and whose documents some of us have scrounged away. Some of the relevant documents are available through the GEDCOM model page on this wiki. I have also archived them on the DeadEnds software domain. Here is the URL with links to all the docs:

http://deadendssoftware.com/

There's an old 5.3 spec in text format, and pdf's for 5.5 (the official version), 5.5.1, 5.6 (first foray into XML) and 6.0 (GEDCOM XML). There is also an interesting future directions document and a short letter describing the move to XML. These are all from the LDS. I have also included the CommSoft document on Event GEDCOM.

These are all required reading for anyone interested in having a legitimate say on what the Better GEDCOM model should include. Do your homework; there could be a pop quiz in the morning.

Tom Wetmore

Andy_Hatchett 2011-01-06T19:04:10-08:00

Louis,

Yes- I've read Tamura's blog- and I consider it the Gold Standard as far as software reviews go... but one Blogger alone isn't enough.

What I am saying is that the other major blogs- Myrt's, Eastman's, etc. have to become more like it. No more just posting press releases.

They have to actively work to build a sense of dis-satisfaction among the end-user community of ALL present genealogy programs so that the developers of those programs feel forced to make a change to meet the end-user community's desires.

That is what will get BetterGEDCOM accepted. Only fear of losing market will encourage the developers to make a change.

ttwetmore 2011-01-07T00:28:11-08:00

Myrt: Now I'd like to see a NEW WAY to actually arrive at conclusions.

I’ve never worked on a wiki consensus project before, so don’t have wisdom to offer. My previous work on projects of this size was done by committees of geeks organized by management. Important tasks were itemized and farmed out to sub-groups based on interest and experience, with each subgroup run by someone with responsibility for completing the tasks. Decisions made by subgroups were ratified by the full committees.

Myrt: I am beginning to think there are TOO many points being considered, and TOO many places to put things for discussion here on the wiki.

I don’t feel overwhelmed by the breadth of subjects being discussed. My only difficulty is keeping track of what discussions are going on and where. I track things by using the “recent changes” page, but it’s still hard because things go off screen fast. I do believe discussions need to be organized better. I feel there are three things at odds with one another – creating wiki pages, posting to wiki discussions, and posting to the blog. New information shows up all three ways. It doesn’t feel right to me that there should be both a wiki and a blog.

Myrt: I'd like your thoughts on how to narrow this down, so point by point we can get off the pot … I think more focused discussion on one topic, like we did during our meeting is called for... not sure how to implement it.

I think pure consensus is slow and difficult on all but the simplest questions. The conventional approach has been for a person or small group to put together proposals and present them to a larger group for comment and acceptance. It may be that the wiki revolution will replace this model, but I am a very old dog dealing with very new tricks. The question for me is will wikis cause people to self-organize into efficient problem solving enterprises that can do better and faster work than top-down efforts.

I do think the Better GEDCOM wiki has successfully elucidated the major areas that require attention, and the members are segregating themselves into interest groups. So the wiki is working for problem identification. We don’t know whether the wiki will work for real problem solving though. For thorny problems (like the data model) I can’t see answers bubbling up out of consensus. There is some real work involved that some one or some group is going to have to do.

Tom Wetmore

hrworth 2011-01-07T05:18:25-08:00

Andy,

You are right about the Genea-Blogging Community. Dear Myrtle and I both have reached out to that community and we HAVE seen them "around". Besides Tamura, Randy Seaver was on the call on Monday.

So, we are trying to do exactly that and have since this project started.

Just to be a little clearer, we want THEM to help US understand the current thinking of the Genealogy Pros "out there". We had a term for them, "The Big G". What is the approach that Mark Tucker has, or that the nationwide genealogy groups are working on? What impact, if any, could what they are doing have on BetterGEDCOM in the future? What do we need to consider, either now or later, based on the larger genealogy community's actions? The Genea-Bloggers are a way to communicate this work.

This wiki and the blog have both been 'talked about' on a number of the Blogs already.

We'll get there.

Thank you,

Russ

AdrianB38 2011-01-07T06:59:37-08:00

"the other major blogs ... have to actively work to build a sense of dis-satisfaction among the end-user community of ALL present genealogy programs"

Um. Maybe I'm misreading your suggestions Andy, but it would be interesting to know if the major (and sane) bloggers really think they have that much power.

Firstly, bear in mind that most of those guys work out of the US and their consumption outside the US, or maybe the English-speaking world, is - >>percentage-wise<< not that great. Which means their impact on non-US software suppliers is not going to be great. (Though having said that, I reckon most non-US suppliers would fall into line and follow advances made by the US).

Secondly, anyone who has their "stuff" reviewed and blogged about on the internet will almost certainly hold the view that the _typical_ blogger is an anonymous, ignorant, conspiracy-theorist with the ambitions to be the next major shock-jock but neither the inventiveness nor the intelligence to make it. We do not, of course, have any typical bloggers on here! So if the esteemed bloggers do start trying to build dis-satisfaction among the end-user community of all present genealogy programs, I'm afraid that the typical response of these software suppliers will be that "Myrt EastMavenSider has joined the Dark Side".

Or let me put it another way - most of these guys are always mentioning how important citing sources is. So why aren't we all doing it?

I think the blogging community _has_ a part to play - but it has to be one with a positive spin on it - not "Look how rubbish this is..." but "Look how good this alternative is". And that probably means we need someone to concoct at least some demo / concept software, so getting the bloggers to upset all the suppliers seems a touch dangerous.

Like I said, maybe I'm misreading your suggestions Andy.

DearMYRTLE 2011-01-07T11:01:34-08:00

Darth Vader has just told me he is my father!

How shall I document that? ;-D

GeneJ 2011-01-07T14:44:03-08:00

Donno ... but I'm sure you will figure out a way and include a source, too.

Andy_Hatchett 2011-01-07T15:12:38-08:00

Adrian,

"Look how rubbish this is" and "Look how good this alternative is" are not mutually exclusive; indeed, they are, when used together, more effective than when used alone.

It really doesn't matter how much (or little) power the major (and sane) bloggers think they have. It is how much the software suppliers think they have...

And it is obvious that at least the major suppliers think the bloggers have some power, or the bloggers wouldn't get the perks (paid rooms, etc.) that they get from said suppliers when attending certain functions.

What I'm really saying is that it is going to take a major PR push to get the user community to really see the need for BetterGEDCOM, because until the user community really sees that need *AND EXPRESSES* the desire that the software suppliers fill that need, the software suppliers will have no reason to use BG to begin with- no matter how big a "hook" it has.

At least that is how I see the situation.

TamuraJones 2011-01-08T00:12:31-08:00

I find the suggestion that bloggers need to build dissatisfaction hilarious. There is no shortage of dissatisfaction with current products.
In fact, it is from that dissatisfaction that the BetterGEDCOM initiative arose.
Methinks the right goal is exactly the opposite; to build user satisfaction by guiding readers to good products and warning them against bad ones.
That is what real reviews do. That is what my annual GeneAwards do.

The notion that you need to expose "every little thing wrong" is nonsense. There are plenty of little things that matter little.
Then again, if a program has a major issue, say it crashes all the time or cannot read or write GEDCOM files, then you hardly need to discuss anything else.
There is little sense in discussing how great the charts look if you cannot even import your own data into it...

Bloggers are far from unimportant in forming public opinion or swaying vendors.
Think of bloggers as the voices for the silent majority of users. Bloggers certainly are among the voices most loudly heard by the vendors.
That you are heard can be put to good use. There are vendors that listen and improve their product in response to a review.

Andy says that bloggers need to get out of bed with the vendors. That may be true, but not only for bloggers...
Anyone/thing that acts as a "shill for the vendors" at the cost of users is part of the Dark Side. A genea Jedi uses the blogging force to defend the interest of the user.

Yes, Andy is right that being truthful may cost you a few perks, such as vendor sponsoring of or advertising on your blog, access to beta copies, invites to events or official blogger badges. Several vendors/organisations prefer to deal with proverbial "yes-men", and do reward bloggers for parroting their company line, and try to bribe or punish you if you don't.
Question is, can you respect yourself in the morning?

Andy_Hatchett 2011-01-08T10:00:02-08:00

Tamura, you said:
" re: Thoughts on GEDCOM and Better GEDCOM
TamuraJones Today 2:12 am

I find the suggestion that bloggers need to build dissatisfaction hilarious. There is no shortage of dissatisfaction with current products.
In fact, it is from that dissatisfaction that the BetterGEDCOM initiative arose."

If such be the case, then it only becomes a matter of getting that dissatisfaction communicated to the software developers much more forcefully than has heretofore been the case.

We need to get the user community communicating directly and in public about their dissatisfactions with each individual package.

Yes- I want to create a public relations nightmare for the software developers until they are forced to react by either accepting some new third party standard or all of them getting together and developing a new one that all major producers of genealogical software will agree to abide by in its totality.

Thus it is imperative that the entire end-user community be involved. I want to see them marching, torches held high and pitchforks waving!

:)

AdrianB38 2011-01-08T12:34:21-08:00

Andy - you said "'Look how rubbish this is' and 'Look how good this alternative is' are not mutually exclusive; indeed, they are, when used together, more effective than when used alone."

Absolutely agree with you - and it's important that we keep the positive element in the message as well. It's not just the negative communicating of dissatisfaction but also the positive pointing out of the alternatives. Not just the black clouds of a PR nightmare but the light at the end of the tunnel as well - that's what I want to remind us of. (Think I might be mixing my metaphors there)

hrworth 2011-01-05T03:53:11-08:00

GeneJ,

I do NOT think that we did any control testing on this, nor did you tell me what you saw or didn't see in TMG. I also did NOT say anything about what Family Tree Maker does or doesn't do, as we never got into any of that detail.

Thank you,

Russ

louiskessler 2011-01-05T17:16:00-08:00

See my blog entry:

Build a BetterGEDCOM or learn GEDCOMBetter?
http://www.beholdgenealogy.com/blog/?p=803

Louis

ttwetmore 2011-01-05T19:07:02-08:00

Louis,

I've read your blog entry and am trying hard to respect your opinions. I think you are wrong about GEDCOM, and you prove it to me by how you blame everything EXCEPT GEDCOM for all the problems that GEDCOM causes.

Your argument that because 99% of the industry uses GEDCOM that that makes it a good standard is laughable. Have you completely forgotten that every program that doesn't use GEDCOM as its internal database CANNOT fully import GEDCOM files and CANNOT fully export their databases to GEDCOM? None of these programs can round trip GEDCOM info. Are you really willing to claim that these applications support GEDCOM, and that GEDCOM supports the industry, and that this proves that GEDCOM is a good thing, when this fundamental and terrible thing is true? I guess you just blame the application developers for not doing it right and therefore can absolve GEDCOM of any responsibility. Have you ever had the thought that maybe this is because the data model that GEDCOM supports is just not complete enough for the needs of real genealogical applications? Your claim that all GEDCOM needs is more understanding and maybe a teeny bit of tweaking seems so naive to me that I am at a loss for words.

Your blog entry really disappointed me. You are on home ground and therefore can afford to take on a pontificating, know it all stance, but you are very off-putting. You have glossed over and minimized all of GEDCOM's faults, twisted all arguments around, and taken on a me against the whole world position as if you were some noble crusader in a world of plodding idiots. Plus you've called all of us naive at best and stupid at worst.

I believe you have some valid points, and I believe GEDCOM could be expanded into Better GEDCOM by leaving most of what's already in GEDCOM there and adding more data types and other concepts. I think all of the Better GEDCOM team thinks that as well. But it isn't minor tweaks as you contend; it's a major enhancement that will require at least a couple more record types. We need to add at least multi-role events, and we need to add the concept of an evidence person. Your argument that GEDCOM already has these two concepts is in my mind nothing more than grabbing at straws.

But at this point I'm just very disappointed and have no energy to continue; maybe we can pick it up again later.

Tom Wetmore

louiskessler 2011-01-05T20:47:54-08:00

Tom,

Fair enough. We have agreed to disagree for two decades. I respect your opinions and hope you find a way to respect mine.

However, I do have to rebut your 2nd paragraph: Somehow you think that GEDCOM is at fault and a standard can be created to replace GEDCOM which will allow every program that is using it to fully import, fully export and fully round-trip ALL its data. And you think that can happen without requiring that the programmers understand the details of that standard and follow that standard as it is written. ... Sorry. Don't think so.

ttwetmore 2011-01-05T23:51:32-08:00

Louis,

It's not somehow I think. I KNOW how GEDCOM is at fault. It simply doesn't support a full enough genealogical data model to meet the needs of genealogical applications. They can't round trip their data through GEDCOM because GEDCOM isn't "big" enough.
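To make the round-trip point concrete, here is a minimal sketch in Python. It is not taken from any real program: the `SUPPORTED` tag set, the `_XYZ` custom tag, and the sample record are all invented for illustration. It assumes an application whose internal model stores only a handful of tags, and shows that whatever the model cannot hold is gone by the time the data is exported again.

```python
# Hypothetical internal model: the only tags this imaginary application can store.
SUPPORTED = {"INDI", "NAME", "BIRT", "DATE", "PLAC"}


def import_lines(lines):
    """Keep only lines the model can store.

    A dropped line takes its subordinate (deeper-level) lines with it,
    because they would have nothing to hang from.
    """
    kept = []
    skip_below = None  # level of the most recently dropped line, if any
    for line in lines:
        parts = line.split(" ", 3)
        level = int(parts[0])
        if skip_below is not None and level > skip_below:
            continue  # child of a dropped line
        skip_below = None
        # Records look like "0 @I1@ INDI"; other lines look like "1 NAME value".
        tag = parts[2] if parts[1].startswith("@") else parts[1]
        if tag in SUPPORTED:
            kept.append(line)
        else:
            skip_below = level
    return kept


def round_trip_loss(lines):
    """Return the lines that do not survive import followed by export."""
    exported = import_lines(lines)  # export is assumed to write back what was kept
    return [line for line in lines if line not in exported]


# Invented sample record.
ORIGINAL = [
    "0 @I1@ INDI",
    "1 NAME John /Smith/",
    "1 BIRT",
    "2 DATE 1 JAN 1850",
    "1 _XYZ some detail",
    "2 NOTE attached to the custom tag",
    "1 OCCU farmer",
]

LOST = round_trip_loss(ORIGINAL)
for line in LOST:
    print("lost:", line)
```

The same loss happens in the other direction when the application's database holds concepts that the exchange format has no place for, which is the situation described above.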

You say "And you think that can happen without requiring that the programmers understand the details of that standard and follow that standard as it is written"

Did you make that up, or do you really think that I or ANYONE could really believe a programmer could implement a standard without understanding it? Are you implying that Better GEDCOM will fail because it will be a standard? Or are you implying that GEDCOM didn't really fail, it was just all those dumb application developers who didn't understand it? You're making up something crazy and then implying that I believe it. That's a dirty trick. I would rather not argue that way.

Tom Wetmore

louiskessler 2011-01-06T06:35:05-08:00

Tom,

You're totally missing my point here.

Russ and the others at the BetterGEDCOM blog are not talking about programs mis-communicating because GEDCOM doesn't have an Event record, or because GEDCOM doesn't have an evidence-proof model. They're pointing out data transfer problems and problems with interpretation of basic information that is already defined in GEDCOM.

The programs today are not following the specs even though the specs have been available to them for 15 years. If they don't follow them now, and BetterGEDCOM gives them a new spec to follow, I don't see why they would all of a sudden decide to follow the new specs.

Every program has a different internal data structure. No model, whether GEDCOM or Better GEDCOM will ever match the data structure unless the data structure is adapted to follow the model. So no matter what, some work will have to be done by all developers to either properly implement GEDCOM, or properly implement BetterGEDCOM. If they don't feel it is a necessity, which today they don't, then this will not happen.

If the developers don't do their homework and understand the standard, then there will be mistakes made, and the data won't transfer or archive properly, now, or anytime in the future.

So I'm implying that any standard will fail if the users of the standard don't understand it and don't follow it properly.

ttwetmore 2011-01-06T08:25:36-08:00

Louis,

Thanks for the clarification.

I think there are two main points in the anti-GEDCOM camp.

First that GEDCOM is not strictly enough defined so that only one interpretation is possible.
Second that GEDCOM does not encompass a wide enough data model to meet the needs of modern genealogical applications.

I do know what Russ and others are doing on the blog, and you are right about that; they are addressing the first of these. They are cataloging the various different interpretations and yes, misinterpretations, found in today's applications when processing GEDCOM, as a way of anticipating the problem areas that will have to be carefully addressed by Better GEDCOM.

I understand your argument. You believe the entire (well most of it anyway) problem with GEDCOM is that the developers of applications didn't understand it well enough to implement it correctly. You don't buy the argument that it was the sloppiness of the GEDCOM specification itself that caused the misinterpretations. I believe both points are right, but I think the sloppiness point is the major one, and I believe you think the sloppiness is really misinterpretation that can be easily fixed. However, I think you are falling into the trap of believing that your interpretation is the correct one and that if everyone would just have interpreted it your, correct way, there would be no problems.

The second point is the point that interests me. The stuff that Russ et al are doing is interesting, but the real problem in my mind is the proper genealogical data model. Here we differ also. You believe the GEDCOM model is almost adequate for the enlarged model, and that only some small modifications are necessary to reach that point. I think the GEDCOM model is far from that state so it makes more sense to, not start from scratch really, but to accept that the changes will be so big that a real effort is involved.

I believe that the enhanced GEDCOM could use the GEDCOM syntax, and could reuse many of the GEDCOM tags and semantics that currently exist. As you have probably become aware I believe the final syntax is a trivial afterthought. What I’ve learned over the years, however, is that people often avoid the hard problems by thrashing around with trivial things; it makes them feel like they are getting something done while the difficult stuff is cooking away on the back burners of their minds. Not to step on too many more toes, but to me all the arguments about archiving and long-term storage, and should we pick XML or JSON, are examples of being detoured into trivial matters that avoid the big ones. If Better GEDCOM did as you suggest, firm up GEDCOM, specify the tags and values strictly and formally, add tags and record types to enhance the model, I would get behind this. I would see that as an excellent Better GEDCOM process. But all the XML’nics will come out of the woodwork. As I said I got out of the way of that locomotive.

Your point about why would a developer follow a new spec if they won't follow an old spec is a great question, and it is the question that I called the "elephant in the living room" in my original document. Developers will only implement a standard if they feel there is something big they gain from it. I do not share the optimism on this wiki that Better GEDCOM will provide that incentive. I believe that Better GEDCOM is doomed to failure because of this, and that we will still be dealing with GEDCOM import and export for the next fifty years. I gave up on a Newer GEDCOM more than a decade ago when I saw the useless work the GenTech team did and when my own analysis of the problem led to the conclusion that without a major monetary incentive, the genealogical application cottage industry would never band together to support a new standard. And there simply is no incentive. And giving you credit where credit is due, there really isn't that much incentive to get GEDCOM import and export working all that well or correctly either!

I am in this effort simply because I think it is a wonderful problem that is very interesting, and that if WE COULD solve it WE COULD make a big difference. There is at least the tiniest sliver of chance that we could be successful. For the past twelve years I have been totally content to dream about my DeadEnds model, and implement bits and pieces of it as time allowed, without any hopes or plans that it would revolutionize genealogical software. I think you and I are exactly the same in this. You have poured your soul and your philosophy into Behold, and I into LifeLines and DeadEnds. We are both highly skilled software professionals and highly analytic genealogists. That led both of us immediately to create programs that used GEDCOM as their databases, and to allow GEDCOM extensions to solve all problems of representation that we encountered. I think we are both nearly at the same point. We both know that the GEDCOM model needs enhancement to meet the needs of modern applications. You think a relatively small change to GEDCOM is needed while I think a relatively large one is.

The big difference between us is how we see the second main anti-GEDCOM point. You see it as small and easily fixed with some tweaks. I see it as a major issue and the raison d’etre of the whole Better GEDCOM effort.

Enough. Super glad we are back on a personal even keel!

Tom Wetmore

SandyRumble 2011-01-06T11:20:00-08:00

I think elements of both arguments are correct.
Why was the original GEDCOM specification created, ie. what was its goal? As I understand it, it was for electronic submission of temple data (~1985).
As the PC evolved and genealogy management software tools evolved, the GEDCOM was still a way to submit your family tree data AND users began to use it to exchange data between researchers. The use of the file evolved (as typically happens in the computer world).
The original intent was temple submissions leading to users asking (forcing) vendors to support the specification. No support, I don't buy your product.
Fast forward to 1996, the Family History Dept says the specification is as is, resulting in a loss of backing from the main supporter removing the motivation for the support of the GEDCOM functionality.
Fast forward to 2011. Vendor support for GEDCOM has eroded due to lack of a mandating need for the user community, resulting in less compliance with the specification. The flexibility in genealogy management programs has exploded as more hobbyists are flooding into the area, but most without a desire to submit temple data. Mix in the changes to the Family Search that we've been hearing about, putting less emphasis on the need for GEDCOM's, and you're left with where we are today.
Users are using GEDCOM's to share (insert your word preference here) data AND to shop genealogy programs. Looking at the broader genealogy market, not just the power researchers, fewer users are using it to submit temple data. Over the past several years, not only has vendor compliance dropped, but data from the new features in their programs is not recorded in the GEDCOM file; sometimes it is included, but not in a compliant manner. Why? To protect their customer base! People are much less likely to jump vendors/products if they can't take all of their data!
This is the true problem, GEDCOM files that are not compliant because users aren't requiring it! The WHY is key to the BetterGEDCOM effort. We can design a better specification, whether we correct deficiencies in the current model or we build a new one, but the bigger issue is the vendors have no reason to support it.

Sandy Rumble

louiskessler 2011-01-06T11:40:00-08:00

Tom,

I think you've summed everything up perfectly.

I think it's important that both of us participate in BetterGEDCOM and can represent somewhat differing points of view. It will help BetterGEDCOM become a more open model. It would be nice if there were a few other developers that would join us and add even more flavour to the pot.

And maybe my definition of "tweak" is different than yours. I would consider changing Events to be an entity to be a tweak. I would consider merging an evidence/conclusion system into the GEDCOM to be a tweak. I would consider adding new Tags or moving tags around to be a tweak. These are all things that could bring a 5.5 version up to a 5.6 or even a 6.0. Anything that builds upon the current GEDCOM philosophy and evolves it would be a natural evolution to me.

Actually, and I admit I haven't looked at it in detail, I might prefer to start with the model behind the GEDCOM XML 6.0 spec, because that was the last effort that the Family History guys left us with that is the most advanced, but has the original philosophy still embedded. I truly believe their original philosophy was solid and well-thought out.

Let's now see this go forward, together.

Louis

testuser42 2011-01-06T13:00:36-08:00

Louis, as far as I can tell, Tom is also very much building his ideas on current GEDCOM. Everyone is, I think, since GEDCOM got a whole lot of the basics right. GenTech, on the other hand, seems to have tried to start from scratch and do everything different just for the sake of it... and that didn't work out so well.

DearMYRTLE 2011-01-06T17:15:21-08:00

Tom and Louis, et al.
I appreciate the time you've put into just this portion of the conversation. Now I'd like to see a NEW WAY to actually arrive at conclusions.

We want to make that more than a "sliver" of a chance for BetterGEDCOM to succeed. It isn't BetterGEDCOM's success per se, it is really the end-user.

I am beginning to think there are TOO many points being considered, and TOO many places to put things for discussion here on the wiki.

I was amazed by the ability to agree during our Developers' meeting this past Monday. It took a little moderating, but it did get done.

I'd like your thoughts on how to narrow this down, so point by point we can get off the pot.

You all, Tamura and others have been thinking along these lines for years and years. I am happy to see you all in the same sandbox.

I believe we can overcome the challenge to developers "buying in" to this, IF the buying public demands better performance.

Are we to relegate the end users to something like a "microsoft only" world when it comes to genealogy software... meaning just because you started with XYZ genealogy program you have to stay with it? I sure hope not.

I think more focused discussion on one topic, like we did during our meeting is called for... not sure how to implement it.

Respectfully submitted.

Andy_Hatchett 2011-01-06T18:16:08-08:00

Dear Myrt,

The only way that the buying public is going to demand better performance is if the Blogging community really starts a long term effort of telling them to demand better performance from their software providers- and I don't mean in wishy-washy words.

The only way to do that is to really expose the flaws in the present programs and start telling the public they deserve better.

No more glowing reports of new features, bells, and whistles but rather a scathing expose of every little thing wrong in a program.

In other words, The Blogging community has to get out of bed with the software vendors and get into bed with the User community or admit to the public that they are nothing but shills for the vendors.

Yes, it will cost them a few perks, some doors will be shut to them- but in the long run the end-users will benefit.

Like it or not it really is an us vs. them world where this issue is concerned.

testuser42 2011-01-04T13:41:01-08:00

Thank you!
IMHO, this is a good overview and sound reasoning. I guess there's not much anybody could disagree with.

louiskessler 2011-01-04T21:36:09-08:00

Tom,

Thank you for your excellent presentation about your thoughts regarding GEDCOM and BetterGEDCOM. I think that is beneficial to everyone to read and it was very thought provoking for me. I think I would like to comment on many of your points to express what is my view which (as you and I and probably everyone here know already) is somewhat opposed to yours, and that is good, since together we'll make sure that BetterGEDCOM, if it does evolve, will handle diverse challenges from others. In advance, I do apologise for the length of this post, but I want to follow Tom's structure and go into some detail.

Introduction:

Just wondering if you worked with Ken Thompson at Bell Labs - a brilliant man and one of the fathers of Unix. I know him from his chess program BELLE and met him in 1978 in Washington, D.C.

Just to give a bit of a background of where I'm coming from, I've got a BSc Honours in Statistics and an MSc in Computer Science. I'm now working as the Manager of the Electric and Gas Forecasting Department at our Province's power utility. I do try to keep my worklife and my genealogy software development as two separate lives. I've been doing my genealogy for over 30 years and I've also been programming for over 30 years.

Behold:

I started developing my genealogy program Behold about 15 years ago, but my day-job and other outside involvements have made its development slow going. The main thing is that my program has started out as a genealogy data browser. As such, I have always used GEDCOM as its input medium. I just looked and I have almost 600 GEDCOM files that have been output from 100 different programs and versions of programs.

So my program reads GEDCOM files. But in order to be flexible, I really say that it reads what I call "Extended GEDCOM". My program is built to input and interpret as best as possible, any sort of GEDCOM junk that there may be out there. So I have seen all the flavours of GEDCOM and the many and varying ways in which developers have exported their data and how they've made use of GEDCOM, both correctly and incorrectly.

GenTech:

I agree with you totally about it being "complex and esoteric". It had no chance from the start as a practical data model that anyone would want to adopt. Too bad. There were great minds involved in that.

BetterGEDCOM:

I am probably alone in this view, but my understanding of GEDCOM tells me this: I say "GEDCOM is NOT as bad as everyone thinks." It actually has many features that are built in that have sort of stayed under the radar and never got adopted or used by many genealogy program developers. But they really were forward thinking, and it was the simplicity of the programs at the time that probably prevented these features from being used, because they were, and are in fact NOT absent from the GEDCOM spec. I will highlight some of them below as I go through the items.

Problems with GEDCOM:

GEDCOM is Restricted to a Simple Model of Genealogy

"The information that can be conveyed ... is restricted to simple information about persons and families. ... Any other information must be relegated to unstructured notes."

Sorry, no, I really disagree. GEDCOM has its 100 or so TAGS, but that is not the issue. It is that GEDCOM allows for extensions. It does this in a few different ways.

1. User-defined Tags, aka Custom Tags. These are tags with a leading underscore. e.g. _URL, _EMI, _HAM, _HAN, _MVR, _OBT. Many programs do take advantage of these and they are in many of the GEDCOMs that I have. In earlier GEDCOM versions, the developers included a SCHEMA definition in the header record as a mechanism for creating user-defined tags. This would help define their meaning. However, it was deemed too complex for most programmers to want to implement and was removed from GEDCOM version 5.4. Some programs, most notably Family Tree Maker, still export the SCHEMA definition in the header (wrongly claiming to be GEDCOM 5.5 while doing that) and use it in the most trivial way that makes you wonder why they bother.

But the idea here is that the User-defined tags are available in current GEDCOM. I like the concept. All that would be really necessary here would be to define a set of these tags to extend the language and give them meanings that all programs can decide to honor. After all, using _URL as a URL makes sense, doesn't it?

I don't feel we should try to define every tag possible. You can't do it. Someone will always think of one more. I like the GEDCOM way, that defines a minimal set and allows user-defined extensions.
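To make the underscore convention concrete, here is a minimal Python sketch of how a reader might inventory the user-defined tags in a file. The sample lines and their values are made up for illustration (only the tag names _URL and _EMI come from the list above), and the function name is hypothetical. It is not how Behold or any other program actually does it.

```python
from collections import Counter

# Made-up sample; the values after _URL and _EMI are placeholders.
SAMPLE = """\
0 @I1@ INDI
1 NAME Fred /Jones/
1 _URL http://example.com/fred
1 _EMI placeholder value
1 _URL http://example.com/fred2
"""

def custom_tag_counts(gedcom_text):
    """Count tags that start with an underscore (user-defined tags)."""
    counts = Counter()
    for line in gedcom_text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # blank or malformed line
        # On record lines the second token is the @XREF@ id, so the tag is third.
        if parts[1].startswith("@") and len(parts) > 2:
            tag = parts[2]
        else:
            tag = parts[1]
        if tag.startswith("_"):
            counts[tag] += 1
    return dict(counts)

print(custom_tag_counts(SAMPLE))
```

An agreed list of meanings for the most common of these tags could then be checked against such an inventory.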

2. TYPE and RELA tags. These are tags in GEDCOM that are underutilized. Very few programs include them, but they are a very powerful feature of GEDCOM.

The TYPE tag classifies its superior tag for the viewer. I've seen it used most often to extend the EVEN (event) Tag. The example from the GEDCOM definition is:

EVENT_DESCRIPTOR:= {Size=1:90}
Text describing a particular event pertaining to the individual or family. This event value is usually assigned to the EVEN tag. The classification as to the difference between this specific event and other occurrences of the EVENt tag is indicated by the use of a subordinate TYPE tag selected from the EVENT_DETAIL structure. For example;

1 EVEN Appointed Zoning Committee Chairperson
2 TYPE Civic Appointments
2 DATE FROM JAN 1952 TO JAN 1956
2 PLAC Cove, Cache, Utah
2 AGNC Cove City Redevelopment

EVENT_OR_FACT_CLASSIFICATION:= {Size=1:90}
A descriptive word or phrase used to further classify the parent event or attribute tag. This should be used whenever either of the generic EVEN or FACT tags are used. The value of this primative is responsible for classifying the generic event or fact being cited. For example, if the attribute being defined was one of the persons skills, such as woodworking, the FACT tag would have the value of `Woodworking', followed by a subordinate TYPE tag with the value `Skills.'

1 FACT Woodworking
2 TYPE Skills

This groups the fact into a generic skills attribute, and in particular this entry records the fact that this individual possessed the skill of woodworking. Using the subordinate TYPE tag classification method with any of the other defined event tags provides a further classification of the parent tag but does not change the basic meaning of the parent tag. For example, a MARR tag could be subordinated with a TYPE tag with an EVENT_DESCRIPTOR value of `Common Law.'

1 MARR
2 TYPE Common Law

This classifies the entry as a common law marriage but the event is still a marriage event. Other descriptor values might include, for example,`stillborn' as a qualifier to BIRTh or `Tribal Custom' as a qualifier to MARRiage.

The TYPE tag is also used to describe multimedia formats, e.g.:

n @XREF:OBJE@ OBJE {1:1}
+1 FILE <MULTIMEDIA_FILE_REFN> {1:M} p.54
+2 FORM <MULTIMEDIA_FORMAT> {1:1} p.54
+3 TYPE <SOURCE_MEDIA_TYPE> {0:1} p.62

Reference numbers:

+1 REFN <USER_REFERENCE_NUMBER> {0:M} p.63, 64
+2 TYPE <USER_REFERENCE_TYPE> {0:1} p.64

Name types:

PERSONAL_NAME_STRUCTURE:=
n NAME <NAME_PERSONAL> {1:1} p.54
+1 TYPE <NAME_TYPE> {0:1} p.56

and this is found elsewhere as well. It is a very underutilized tag.
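As a rough illustration of why TYPE values are still usable structure, the following Python sketch groups level-1 events and facts by their subordinate TYPE value. It reuses the example lines from the GEDCOM text quoted above; the function name is hypothetical, and the sketch assumes TYPE always sits directly under a level-1 tag.

```python
from collections import defaultdict

SAMPLE = """\
0 @I1@ INDI
1 EVEN Appointed Zoning Committee Chairperson
2 TYPE Civic Appointments
2 DATE FROM JAN 1952 TO JAN 1956
1 FACT Woodworking
2 TYPE Skills
1 MARR
2 TYPE Common Law
"""

def group_by_type(gedcom_text):
    """Map each TYPE value to the (tag, value) pairs it classifies."""
    groups = defaultdict(list)
    parent = None  # (tag, value) of the most recent level-1 line
    for line in gedcom_text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            parent = None
        elif level == 1:
            parent = (tag, value)
        elif level == 2 and tag == "TYPE" and parent is not None:
            groups[value].append(parent)
    return dict(groups)

print(group_by_type(SAMPLE))
```

A program that does not know what "Civic Appointments" means can still list everything classified under it.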

The RELA (Relationship) tag is also very underutilized. It is usually used with the ASSO (Association) tag.

Other associations or relationships are represented by the ASSOciation tag. The person's relation or association is the person being pointed to. The association or relationship is stated by the value on the subordinate RELA line. For example:
0 @I1@ INDI
1 NAME Fred/Jones/
1 ASSO @I2@
2 RELA Godfather

The one restriction is that the association pointer only associates INDIvidual records to INDIvidual records. But that may be adequate as I have had trouble thinking of concrete examples where you would necessarily need anything else that cannot be handled in some other way.

The RELA tag was only added in GEDCOM 5.4 so earlier programs could not have used it.
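Here is a small Python sketch of pulling ASSO/RELA pairs out as (person, relationship, other person) triples. It uses the Fred Jones example above; the function name is hypothetical, and it assumes ASSO appears at level 1 with RELA directly beneath it.

```python
SAMPLE = """\
0 @I1@ INDI
1 NAME Fred/Jones/
1 ASSO @I2@
2 RELA Godfather
"""

def associations(gedcom_text):
    """Return (from_xref, relationship, to_xref) for each ASSO/RELA pair."""
    result = []
    current = None  # xref of the record being read
    target = None   # pointer on the most recent level-1 ASSO line
    for line in gedcom_text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue
        level = int(parts[0])
        if level == 0:
            current = parts[1] if parts[1].startswith("@") else None
            target = None
        elif level == 1:
            is_asso = parts[1] == "ASSO" and len(parts) > 2
            target = parts[2] if is_asso else None
        elif level == 2 and parts[1] == "RELA" and target:
            relation = parts[2] if len(parts) > 2 else ""
            result.append((current, relation, target))
    return result

print(associations(SAMPLE))
```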

The User-defined tags along with the TYPE and RELA tags will allow other information to be identifiable. And because of this, other information will not be relegated to unstructured notes. Thus, I do not feel that your "GEDCOM is Restricted to a Simple Model of Genealogy" is fully true.

Since this is an often overlooked feature, most people don't realize it exists. The model is general and not rigorously defined, but it would not be a major task to define the meanings of a set of values of these tags so that they can be used in a consistent way.

My program Behold is designed to read and present these tags as best as possible. When I add editing in Version 2.0, assuming GEDCOM is still around, I was thinking of making extensive use of them. The only problem with that is that few other programs would probably be able to properly read the GEDCOMs my program produces, and all these special tags would be relegated to unstructured notes. That is not GEDCOM's fault, but the software's fault.

GEDCOM Records Hold Conclusions:

You say "there is no facility in GEDCOM to transport the evidence behind the research that led to the conclusions." I don't agree with that at all. There is a good way, and I have seen it in some GEDCOMs, but it is not used very often.

Let me take your example: "If birth information for a person were extracted from a number of sources, say by estimation from census records, from the inscription on a gravestone, and from an official birth record, there is no easy way to include this information in a GEDCOM Person record." The way this is done is by repeating an event tag within the Person record, e.g.:

0 @I55@ INDI
1 NAME Fred /Flintstone/
1 BIRT
2 DATE 3050B.C.
2 SOUR @S10@
3 PAGE 15
3 QUAY 2
1 BIRT
2 DATE 3052B.C.
2 SOUR @S20@
3 QUAY 1
2 SOUR @S30@
3 QUAY 1
...
0 @S10@ SOUR
(census record)
0 @S20@ SOUR
(gravestone inscription)
0 @S30@ SOUR
(official birth record)

In the above example, the census record said 3050 BC and had a QUAY (Quality) of 2. The gravestone inscription and birth record both said 3052 BC, but they had lower QUAYs of 1 and 1.

There are three items of evidence here giving two hypotheses. There is no conclusion here, although usually the first event should be considered the most plausible one by the genealogist in charge. Is one record of QUAY 2 better than two of QUAY 1? I don't know, but the point is that the information is clearly here, and the evidence is provided in a way that CAN be used for coming to conclusions.
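To show that a program can recover these hypotheses mechanically, here is a minimal Python sketch that lists each repeated BIRT with its date and the sources behind it. It assumes the date is carried on a subordinate DATE line and the QUAY sits directly under its SOUR; the function name is hypothetical.

```python
SAMPLE = """\
0 @I55@ INDI
1 NAME Fred /Flintstone/
1 BIRT
2 DATE 3050B.C.
2 SOUR @S10@
3 PAGE 15
3 QUAY 2
1 BIRT
2 DATE 3052B.C.
2 SOUR @S20@
3 QUAY 1
2 SOUR @S30@
3 QUAY 1
"""

def birth_hypotheses(gedcom_text):
    """Return one dict per BIRT event: its date and its cited sources."""
    hypotheses = []
    in_birt = False
    for line in gedcom_text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level <= 1:
            in_birt = level == 1 and tag == "BIRT"
            if in_birt:
                hypotheses.append({"date": None, "sources": []})
        elif not in_birt:
            continue
        elif level == 2 and tag == "DATE":
            hypotheses[-1]["date"] = value
        elif level == 2 and tag == "SOUR":
            hypotheses[-1]["sources"].append({"sour": value, "quay": None})
        elif level == 3 and tag == "QUAY" and hypotheses[-1]["sources"]:
            hypotheses[-1]["sources"][-1]["quay"] = int(value)
    return hypotheses

for h in birth_hypotheses(SAMPLE):
    print(h["date"], h["sources"])
```

How to weigh one QUAY 2 source against two QUAY 1 sources is left open here, just as it is in the text.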

Your comment that "At best the evidence must be put into unstructured NOTE lines in GEDCOM records" is not true, because it can easily be expressed as I've shown above. Some programs actually do this. They are just so gawd-awful horrible at displaying this in a reasonable manner, that people don't realize the capability is there.

So no. GEDCOM records don't hold only conclusions. They can contain all the possibilities. This is available in GEDCOM now. It is not a weakness of GEDCOM. It is a weakness of the programs that don't use it.

GEDCOM Has No Multi-Role Events:

Sorry. Again I have to disagree. They have it, but they do it differently than you might think it should be done. It is not under the Event that they include the role. It is under the source-citation for the event. Might not seem logical at first, but let me explain:

What GEDCOM has is a ROLE tag.

SOURCE_CITATION:=
n SOUR @<XREF:SOUR>@ {1:1} p.27
+1 PAGE <WHERE_WITHIN_SOURCE> {0:1} p.64
+1 EVEN <EVENT_TYPE_CITED_FROM> {0:1} p.49
+2 ROLE <ROLE_IN_EVENT> {0:1} p.61
+1 DATA {0:1}
+2 DATE <ENTRY_RECORDING_DATE> {0:1} p.48
+2 TEXT <TEXT_FROM_SOURCE> {0:M} p.63
+3 [CONC|CONT] <TEXT_FROM_SOURCE> {0:M}
+1 <<MULTIMEDIA_LINK>> {0:M} p.37, 26
+1 <<NOTE_STRUCTURE>> {0:M} p.37
+1 QUAY <CERTAINTY_ASSESSMENT> {0:1} p.43

ROLE_IN_EVENT:= {Size=1:15}
[ CHIL | HUSB | WIFE | MOTH | FATH | SPOU | (<ROLE_DESCRIPTOR>) ]
Indicates what role this person played in the event that is being cited in this context. For example, if you cite a child's birth record as the source of the mother's name, the value for this field is "MOTH." If you describe the groom of a marriage, the role is "HUSB." If the role is something different than one of the six relationship role tags listed above then enclose the role name within matching parentheses.

ROLE_DESCRIPTOR:= {Size=1:25}
A word or phrase that identifies a person's role in an event being described. This should be the same word or phrase, and in the same language, that the recorder used to define the role in the actual record.

EVENT_TYPE_CITED_FROM:= {SIZE=1:15}
[ <EVENT_ATTRIBUTE_TYPE> ]
A code that indicates the type of event which was responsible for the source entry being recorded. For example, if the entry was created to record a birth of a child, then the type would be BIRT regardless of the assertions made from that record, such as the mother's name or mother's birth date. This will allow a prioritized best view choice and a determination of the certainty associated with the source used in asserting the cited fact.

Most of all, you've got to realize that this is a very powerful structure that to me is almost capable of anything. I've rarely, if ever, seen it used to its full capability. Using an example of my own making, based on the example just above, I get:

0 @I99@ INDI
1 BIRT
2 NOTE The source documents the person's birth event
2 SOUR @S40@
3 PAGE 12
3 QUAY 3

0 @I101@ INDI
1 NOTE The source here documents the maiden name of the mother at the birth
1 NAME mother /Maiden/
2 SOUR @S40@
3 PAGE 12
3 EVEN BIRT
4 ROLE MOTH
3 QUAY 2
1 BIRT
2 DATE 24 OCT 1950
2 NOTE The source here documents the mother's birth date
2 SOUR @S40@
3 PAGE 12
3 EVEN BIRT
4 ROLE MOTH
3 QUAY 3

0 @I199@ INDI
1 NAME Dr /Quack/
1 NOTE The source here documents the name of the doctor at the birth
1 SOUR @S40@
2 PAGE 12
2 EVEN BIRT
3 ROLE (Doctor)
2 QUAY 3

Only one ROLE is allowed per source-citation, but the citations all refer to the same event. This example cites a single source (a birth record) defining a single event (the birth) with two different roles for the birth event, the MOTHer and the Doctor, and the MOTHer role is used for two different facts for the same person. The QUAYs can vary for each citation.
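One way to see the multi-role event hiding in these citations is to gather every ROLE by the source, page and event type it is cited from. The Python sketch below does that for a cut-down version of the example; it assumes PAGE comes before EVEN inside each citation and writes the doctor's role in parentheses as the ROLE_IN_EVENT definition asks. The function name is hypothetical.

```python
from collections import defaultdict

SAMPLE = """\
0 @I101@ INDI
1 NAME mother /Maiden/
2 SOUR @S40@
3 PAGE 12
3 EVEN BIRT
4 ROLE MOTH
0 @I199@ INDI
1 NAME Dr /Quack/
1 SOUR @S40@
2 PAGE 12
2 EVEN BIRT
3 ROLE (Doctor)
"""

def roles_by_cited_event(gedcom_text):
    """Map (source, page, event type) to the set of (person, role) pairs."""
    events = defaultdict(set)
    person = None
    cite = None  # the source citation currently being read, if any
    for line in gedcom_text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue
        level, tag = int(parts[0]), parts[1]
        value = parts[2] if len(parts) > 2 else ""
        if level == 0:
            person, cite = tag, None  # on record lines this slot is the @XREF@
            continue
        if cite is not None and level <= cite["level"]:
            cite = None  # we have left the citation's subtree
        if tag == "SOUR":
            cite = {"level": level, "sour": value, "page": "", "even": ""}
        elif cite is not None:
            if tag == "PAGE" and level == cite["level"] + 1:
                cite["page"] = value
            elif tag == "EVEN" and level == cite["level"] + 1:
                cite["even"] = value
            elif tag == "ROLE" and level == cite["level"] + 2:
                key = (cite["sour"], cite["page"], cite["even"])
                events[key].add((person, value))
    return dict(events)

print(roles_by_cited_event(SAMPLE))
```

Both people end up under the one key for source @S40@, page 12, event BIRT, which is the birth event with its participants.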

So to say these things are NOT in GEDCOM today is totally incorrect. To say that they are basically unused is correct. Maybe you can say they are difficult to use, but unless you try, will you really know? Maybe you can say it can be improved. But I don't necessarily think they are that bad. The fact is that the capability is there.

I'm sure these things need some tweaking, but I conclude that the original GEDCOMers really did a reasonable job thinking GEDCOM through. They've got the essence of an evidence-conclusion model built in, and their QUAY certainty assessment value was wonderfully simple, with the following values:
0 = Unreliable evidence or estimated data
1 = Questionable reliability of evidence (interviews, census, oral genealogies, or potential for bias, for example, an autobiography)
2 = Secondary evidence, data officially recorded sometime after event
3 = Direct and primary evidence used, or by dominance of the evidence

Solving GEDCOM Problems and Better GEDCOM

GEDCOM and XML:

I agree with everything you say.

Extend and Formalize the Sets of Tags Allowed:

As I mentioned above, I like GEDCOM's user-defined tags and the use of the TYPE and RELA tags. Other programs may not "understand" what the tags mean, but the TYPE and RELA tags are descriptive text and data can easily be grouped by these things. It is structured in a defined way. It's not as good as specific defined tags, but it's better than unorganized notes. I'd sooner have fewer tags defined (maybe as many as GEDCOM has now), but Tom has expressed his different idea. My worry is it will never be complete. People will always want new tags. So my thinking is that we should cover 99% of the cases, and let the other 1%, the rare ones, go free.

Evidence Based Person Records

As I mentioned above, I think GEDCOM already allows this.

Multi-Role Event Records as Both Evidence and Conclusion

... and this as well.

Place Records

I agree with Tom that Places need to have their own record. However, I like the comma delimited format of the hierarchy of the place in the name as I have argued on the BetterGEDCOM forums. And then there's my pet thing, which is to allow events on Places, so that you can document events in your ancestral town and have a place to attach it. I've argued that in the forums somewhere as well.

Other Records

I think we all have different pictures in our minds when we talk Sources and Citations and Repositories.

To me a citation is the SOURCE_CITATION structure I gave above.

The GEDCOM standard itself states under the description of the SOURCE_CITATION that:

"Data that allows an assessment of the relative value of one source over another for making the recorded assertions (primary or secondary source, etc.). Data needed for this assessment is data that would help determine how much time from the date of the asserted fact and when the source was actually recorded, what type of event was cited, and what type of role did this person have in the cited source."

Hey. When I read that, it sounds like they were trying to allow for an evidence/conclusion model, at least in a primitive way.

There is actually a lot that can be contained in it, including the multimedia information, text from the source, and notes.

I am in favor of making this SOURCE_CITATION into its own record, as follows:

0 @C400@ CITE
1 SOUR @S40@
1 PAGE 12
1 TEXT blah, blah, blah

The reason is that, in the GEDCOMs I've dealt with, I've encountered the same source-citation used multiple times, sometimes thousands of times. It makes sense, because one birth event documented in one place can give multiple pieces of information, which was also shown in my examples earlier. I don't know what isn't obvious about this and why others see the citation or source-citation so differently. Maybe I'm completely wrong on it, but I don't think so.
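As a sketch of what that would buy, the Python below takes a list of inline (source, page) citations and emits one proposed CITE record per distinct pair, plus the pointer each original citation would be replaced with. The CITE tag and the @C...@ ids are the proposal above, not part of GEDCOM 5.5, and the function name and numbering scheme are hypothetical.

```python
def factor_citations(inline_citations):
    """Deduplicate (source, page) citations into proposed CITE records.

    Returns (pointers, record_lines): one pointer per input citation,
    and the lines of the CITE records those pointers refer to.
    """
    ids = {}       # (source, page) -> citation id, in first-seen order
    pointers = []
    for citation in inline_citations:
        if citation not in ids:
            ids[citation] = "@C%d@" % (len(ids) + 1)
        pointers.append(ids[citation])
    record_lines = []
    for (sour, page), cite_id in ids.items():
        record_lines.append("0 %s CITE" % cite_id)
        record_lines.append("1 SOUR %s" % sour)
        if page:
            record_lines.append("1 PAGE %s" % page)
    return pointers, record_lines

# Three inline citations, two of them identical.
pointers, records = factor_citations(
    [("@S40@", "12"), ("@S40@", "12"), ("@S40@", "13")]
)
print(pointers)
print("\n".join(records))
```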

With regard to Sources, I don't like the idea of building all of Elizabeth Shown Mills into GEDCOM or BetterGEDCOM. Hers is her recommended style. It's not a standard. There was Richard Lackey before her, and there will be others after her. We should just have a general enough system to provide the template needed to allow any type of source. Let the programmers implement the specific systems if they want. I think GEDCOM's source records are already flexible enough to handle just about any source type.

I don't mind an Event record. But if we have it then only one way! (By now, you know that "only one way" is an important thing for me). So in that case, we want something like this:

0 @E400@ EVEN
1 TYPE BIRT
...

and then references to it must ALWAYS be:

n EVEN @E400@

with no inline events allowed!

Once again, the bottom line is that I really don't think GEDCOM is that bad. I think we should fix just what has to be fixed, add just what has to be added, and come out with a GEDCOM 5.6 or 6.0 that is better than the current and has just what we need it to have.

But I don't want everyone going off and writing brand new models that will lose some of the well-thought out original concepts that GEDCOM had. I want to hear a good reason for every change. I am open minded and want the best for BetterGEDCOM. I want it to be better on every front. Changing a wonderfully thought-out comma delimited place scheme that is simple and elegant to a hierarchy scheme with multiple records joined together will add complexity - so there better be tangible benefits for doing so.

Sorry if I sound like I'm ranting and raving. At least you know I'm passionate. I'd like to see a BetterGEDCOM that's truly "better" and has a chance of being adopted by the big guys.

And I'm also sorry for making you take a half an hour to read this. But think of the 4 hours it took me to write it.

Louis

GeneJ 2011-01-04T21:50:23-08:00

These two overviews are excellent. Thank you. --GJ

ttwetmore 2011-01-05T01:49:04-08:00

Louis,

Your long and excellent reply deserves its own document as a statement of your thoughts. I hope they don't get lost as just a response to me.

I would like to respond in detail, but I don't have another four hours right now, so here are a few quick thoughts.

I like GEDCOM as a syntax and believe that it could be extended into Better GEDCOM. As I've said a few times it is isomorphic to XML and therefore equivalent in expressive power. However, XML is a big railroad train and I no longer want to stand in front of it. To be frank what I REALLY want is our own special syntax for expressing genealogical data. In the "olden" days any application that dealt with specialized knowledge had its own special syntax for expressing that knowledge. Genealogy is definitely an application with specialized knowledge, and in my opinion deserves its own language. So my first choice for BG syntax would be a custom syntax; my second choice would be GEDCOM syntax; my third choice would be JSON; and my fourth choice would be XML. And I have already accepted in my heart that BG will be going with my fourth choice. Oh, well. Since they are all isomorphic I can write my own converters back and forth between all four and be a happy camper.
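To illustrate the isomorphism claim, here is a minimal Python sketch that turns GEDCOM lines into an XML tree using only the level numbers. The element and attribute names (GEDCOM, id, value) are assumptions made for this sketch, not any agreed BG mapping, and the sample is the ASSO example from earlier in the thread.

```python
import xml.etree.ElementTree as ET

SAMPLE = """\
0 @I1@ INDI
1 NAME Fred/Jones/
1 ASSO @I2@
2 RELA Godfather
"""

def gedcom_to_xml(gedcom_text):
    """Build an ElementTree from GEDCOM lines; nesting follows the levels."""
    root = ET.Element("GEDCOM")
    stack = [root]  # stack[n] is the parent element for level-n lines
    for line in gedcom_text.splitlines():
        parts = line.split(" ", 2)
        if len(parts) < 2:
            continue
        level = int(parts[0])
        if parts[1].startswith("@"):
            # Record line: level @XREF@ TAG [value]
            xref = parts[1]
            rest = parts[2].split(" ", 1) if len(parts) > 2 else [""]
            tag = rest[0]
            value = rest[1] if len(rest) > 1 else ""
        else:
            xref, tag = None, parts[1]
            value = parts[2] if len(parts) > 2 else ""
        del stack[level + 1:]  # pop back to this line's parent
        elem = ET.SubElement(stack[level], tag)
        if xref:
            elem.set("id", xref)
        if value:
            elem.set("value", value)
        stack.append(elem)
    return root

print(ET.tostring(gedcom_to_xml(SAMPLE), encoding="unicode"))
```

Going the other way is a depth-first walk that prints the depth as the level number, which is why a converter between the syntaxes is not much work.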

For LifeLines I don't worry about GEDCOM tags. LifeLines reads any GEDCOM tags at all and never changes anything about any of the records it reads. It uses the lineage-linked tags as defined in 5.5, the 1 NAME, 2 PLAC, 2 DATE, and many conventional event tags, for display and user interface issues, but is otherwise hands off.

I do disagree with a number of your disagreements with me, but that is not surprising. I'll just quickly list a few. Though first I will say that I am not disagreeing with anything about the GEDCOM syntax, but with the GEDCOM 5.5 standard. That is, if Better GEDCOM says let's stay with GEDCOM syntax I wouldn't care in the least. After all, it is my second choice for syntax.

But, given that, I do believe GEDCOM 5.5 is as bad as we all, except you, think.

First on tags. There are three "kinds" of tags:

1. Those defined by a standard, eg., 5.5.
2. Those application-defined extension tags.
3. Those user-defined extension tags.

You don't seem to mind types 2 and 3. I dislike them heartily. They really can't be shared well. Type 2 can be shared only by users with the same application, but with no one else. Type 3 tags can't be shared by anyone. All one can hope for when extension tags are used is that the importing program won't throw them away. It certainly won't know what to do with them. Users won't be able to see their values, change their values, understand what their values mean, etc, unless the program provides a special user interface that allows users to directly edit the offending GEDCOM structures.

So I think we should put as many tags into BG as possible, trying to anticipate as much as we can, so that extension tags, which I do believe are still necessary, are kept to a minimum. I get the feeling we are irrevocably opposed on this issue, so I won't say more.

I understand your like for the TYPE tag in GEDCOM and agree with you, and believe it can be used to reduce the need for extension tags.

Your claims that GEDCOM is not restricted to a simple model ring false with me over and over.

First you appeal to extension tags and user custom tags to solve this. For me this is just another major problem and not part of a solution.

Your comments about TYPE and RELA I agree with. RELA is used with ASSO and allows the setting up of arbitrary relationships between people. I have pointed out in the past that this is a good thing, and I put the same structure in the DeadEnds model under the "relation" attributes. So I do agree with all you say about the TYPE tag and how it is underutilized in GEDCOM and how it could solve a number of the extension problems. Without going into great detail, let me also say, that I don't believe it is the panacea that you may think it is.

As far as evidence is concerned I fully understand the idea of putting multiple BIRT/whatever events in an INDI as a way to attempt to keep your evidence, but this doesn't do it for me. I want evidence person records themselves, not a conclusion person that simply binds together a bunch of vital events that come from different sources. We may have a fundamental difference here, as you are willing to call the bound together set of vital records the evidence, whereas I am not so willing. For your system to work, as you put in your examples, you have to consider the referred to SOUR records as the evidence. I don't believe this is the correct use of the SOUR record concept, but I can accept it. So I'll grant you that you get evidence into GEDCOM, but for me, not in a truly useful way. I want to be able to manipulate those bound together vital events as objects in their own right. Enough for now, I won't go into that. (Well, to be completely honest with you, I do believe we can use SOUR records as pure evidence. I have often thought of having a separate record type, Evidence, but usually end up considering an Evidence record as nothing other than a "leaf" on a tree of Source records. So I am twisting things a bit in my disagreement with you here -- I actually think you are making an excellent point, but since I'm supposed to be disagreeing here, I really can't say that -- whoops...)

As far as events go, I still say GEDCOM does not support multi-role events. Your appeal to the fact that ROLE info can be put inside EVEN structures inside SOUR records doesn't do too much for me, and I fully understood their existence before reading your note. The multi-role event is hidden so far away from view that it is near useless. The multi-role event is one of the TWO KEY CONCEPTS in genealogy, and shoving it out of sight as GEDCOM 5.5 does is so wrong there is little point in debating it. The multi-role event needs a life of its own, as it would have had in Event GEDCOM. So I'll grant you that GEDCOM 5.5 has a feature that can be called a multi-role event, but it is so poor that to me it is worthless. We need an EVEN record of its own, something I support in LifeLines.

All that being said, however, your detailed example is VERY interesting and shows what can be done when a user has complete control over their GEDCOM text. This is exactly why I wrote LifeLines and I also do a number of things very similar to what you are doing in your example.

But you are, in my opinion, jury rigging GEDCOM so that you can have a SOUR record represent a single evidence event (which I've admitted is okay), and then point back to these "events". I applaud this and think it excellently captures a lot (if not ALL) of the multi-event issue. But can't you see how jury-rigged it feels? The fact that only you do it this way, the fact that other software vendors aren't clever enough to even see the possibilities inherent in this, should be a big red flag that this is not the right way to do this. The multi-event record must be embraced as a major entity of its own, on the same level as a person, in order for this concept to become understood and used as it should be. In your approach you have created multi-person events, but they are things that are "only" substructures within a few carefully crafted SOUR records. And the only reason you are able to do it this way is that your program gives you the low level access to your GEDCOM to allow it to happen. Almost no other application out there is willing to give their users this kind of control. My LifeLines does it. John Nairn's GEDitCom does it, and probably one or two others do it, but you have to admit that only expert users of these very flexible programs have it in their power to do this. You don't see this as a problem with GEDCOM, as you appeal to how GEDCOM COULD be used to support these concepts. You believe that just pointing out these jury-rigged ways of doing things should be enough to say we can just put a little polish on GEDCOM and it will all be done. I simply can't see it that way. If the EVEN record is not pulled out to a level 0 concept I believe Better GEDCOM is already doomed.

GEDCOM uses the term "citation" completely wrongly. A citation is nothing other than the STRING that appears in a footnote or in a bibliography entry as a formal, structured, properly formatted description of where evidence came from. It is the STRING that is created by using templates as defined in E. S. Mills stuff and other "style manuals". It is something that must be DERIVED from a SOUR record (or in my preference, a chain of SOUR records), and certainly isn't a SOUR record on its own. It's too bad we all have different pictures about this, because there really is a true picture, and there really shouldn't be any arguments about it. We should simply decide how to accommodate the concepts in BG records. I have little patience with arguments about things that need no argument. A source citation is what it is. A source is what it is. Some people know what they are. Some don't. Those that don't are wrong and it's silly to keep fighting with them. Concepts have reality behind them, and the idea that we get to democratically vote on what they mean every time a new committee is formed is just plain silly. If BG can't get beyond this it is doomed twice over. (I am venting here, not criticizing you.)
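To make the "citation is a derived STRING" point concrete, here is a minimal Python sketch that formats a citation from the fields of a SOUR record. AUTH, TITL and PUBL are tags from the GEDCOM 5.5 source record; the template, the sample values and the function name are all made up for illustration and do not follow Mills or any other style manual.

```python
# A toy template; a real style manual would have many, one per source type.
DEFAULT_TEMPLATE = "{AUTH}, {TITL} ({PUBL}), {PAGE}."

def format_citation(source_fields, page, template=DEFAULT_TEMPLATE):
    """Derive a citation string from a source's fields plus a page reference."""
    fields = dict(source_fields, PAGE=page)
    return template.format(**fields)

# Hypothetical source record, flattened to a dict of tag -> value.
source = {
    "AUTH": "A. Author",
    "TITL": "Some Parish Register",
    "PUBL": "Place: Publisher, 1900",
}

print(format_citation(source, "p. 12"))
```

Swapping the template changes the citation style without touching the stored source record, which is the separation being argued for.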

Tom Wetmore

Comments